Distributed Kalman Filter 
via Gaussian Belief Propagation 



Danny Bickson 
IBM Haifa Research Lab 
Mount Carmel, Haifa 31905, Israel 
Email: dannybi@il.ibm.com 



Ori Shental 

Center for Magnetic 
Recording Research 

UCSD 
9500 Gilman Drive 
La Jolla, CA 92093, USA 
Email: oshental@ucsd.edu 



Danny Dolev 
School of Computer Science 
and Engineering 
Hebrew University of Jerusalem 
Jerusalem 91904, Israel 
Email: dolev@cs.huji.ac.il 



Abstract — Recent result shows how to compute distribu- 
tively and efficiently the linear MMSE for the multiuser 
detection problem, using the Gaussian BP algorithm. In 
the current work, we extend this construction, and show 
that operating this algorithm twice on the matching inputs, 
has several interesting interpretations. First, we show 
equivalence to computing one iteration of the Kalman filter. 
Second, we show that the Kalman filter is a special case of 
the Gaussian information bottleneck algorithm, when the 
weight parameter j3 = 1. Third, we discuss the relation to 
the Affine-scaling interior-point method and show it is a 
special case of Kalman filter. 

Besides of the theoretical interest of this linking esti- 
mation, compression/clustering and optimization, we allow 
a single distributed implementation of those algorithms, 
which is a highly practical and important task in sensor 
and mobile ad-hoc networks. Application to numerous 
problem domains includes collaborative signal processing 
and distributed allocation of resources in a communication 
network. 

I. Introduction 

Recent work [1] shows how to compute effi- 
ciently and distributively the MMSE prediction for 
the multiuser detection problem, using the Gaussian 
Belief Propagation (GaBP) algorithm. The basic 
idea is to shift the problem from linear algebra 
domain into a probabilistic graphical model, solving 
an equivalent inference problem using the efficient 
belief propagation inference engine. [2] compares 
the empirical performance of the GaBP algorithm 
relative to other linear iterative algorithms, demon- 
strating faster convergence. [3] elaborates on the 
relation to solving systems of linear equations. 



In the present work, we propose to extend the pre- 
vious construction, and show, that by performing the 
MMSE computation twice on the matching inputs 
we are able to compute several algorithms. First, 
we reduce the discrete Kalman filter computation 
[4] to a matrix inversion problem and show how 
to solve it using the GaBP algorithm. We show 
that Kalman filter iteration which is composed from 
prediction and measurement steps can be computed 
by two consecutive MMSE predictions. Second, 
we explore the relation to Gaussian information 
bottleneck (GIB) [5] and show that Kalman filter is 
a special instance of the GIB algorithm, when the 
weight parameter (5 — 1. To the best of the authors 
knowledge, this is the first algorithmic link between 
the information bottleneck framework and linear dy- 
namical systems. Third, we discuss the connection 
to the Affine-scaling interior-point method and show 
it is an instance of the Kalman filter. 

Besides of the theoretical interest of linking com- 
pression, estimation and optimization together, our 
work is highly practical since it proposes a general 
framework for computing all of the above tasks 
distributively in a computer network. This result can 
have many applications in the fields of estimation, 
collaborative signal processing, distributed resource 
allocation etc. 

A closely related work is [6]. In this work, Frey et 
ah focus on the belief propagation algorithm (a.k.a 
sum-product algorithm) using factor graph topolo- 
gies. They show that the Kalman filter algorithm 
can be computed using belief propagation over a 



factor graph. In this contribution we extend their 
work in several directions. First, we extend the 
computation to vector variables (relative to scalar 
variables in Frey's work). Second, we use a different 
graphical model: an undirected graphical model 
which results in simpler update rules, where Frey 
uses factor-graph with two types of messages: factor 
to variables and variables to factors. Third and 
most important, we allow an efficient distributed 
calculation of the Kalman filter steps, where Frey's 
algorithm is centralized. 

Another related work is [7]. In this work the link 
between Kalman filter and linear programming is 
established. In this work we propose a new and 
different construction which ties the two algorithms. 

The structure of this paper is as follows. In 
Section |n] we describe the discrete Kalman filter. 
In Section [III] we outline the GIB algorithm and 
discuss its relation to the Kalman filter. Section 
[IV] presents the Affine- scaling interior-point method 
and compares it to the Kalman filter algorithm. 
Section |V] presents our novel construction for per- 
forming an efficient distributed computation of the 
three methods. 

II. Kalman Filter 

A. An Overview 

The Kalman filter is an efficient iterative al- 
gorithm to estimate the state of a discrete-time 
controlled process x E R" that is governed by the 
linear stochastic difference equation 0: 

x k = Axk-i + + w k -i, (1) 

with a measurement z E R m that is z k = 
Hx k + v k . The random variables w k and v k that rep- 
resent the process and measurement AWGN noise 
(respectively). p(w) ~ N(0,Q),p(v) ~ M(0,R). 
We further assume that the matrices A, H, B, Q, R 
are giverH. 

The discrete Kalman filter update equations are 
given by [4]: 

'in this paper, we assume there is no external input, namely x\ = 
Axk-i + Wk-i- However, our approach can be easily extended to 
support external inputs. 

2 Another possible extension is that the matrices A, H, B,Q, R 
change in time, in this paper we assume they are fixed. However, 
our approach can be generalized to this case as well. 



The prediction step: 

xl = Ar fc _i + Bu k -i, (2a) 
P k ~ = AP k ^A T + Q. (2b) 

The measurement step: 

K k = P~H T {HP-H T + R)-\ (3a) 
%k = K + K k (z k - Hx k ), (3b) 
P k = {I-K k H)P k -. (3c) 

where I is the identity matrix. 

The algorithm operates in rounds. In round k the 
estimates K k ,x k ,P k are computed, incorporating 
the (noisy) measurement z k obtained in this round. 
The output of the algorithm are the mean vector x k 
and the co variance matrix P k . 

B. New construction 

Our novel contribution is a new efficient dis- 
tributed algorithm for computing the Kalman filter. 
We begin by showing that the Kalman filter can 
be computed by inverting the following covariance 
matrix: 

/ -P k -i A \ 
E = ( A T Q H , (4) 
\ H T R ) 

and taking the lower right lxl block to be P k . 

The computation of E^ 1 can be done efficiently 
using recent advances in the field of Gaussian belief 
propagation [1], [3]. The intuition for our approach, 
is that the Kalman filter is composed of two steps. 
In the prediction step, given x k , we compute the 
MMSE prediction of x^ [6]. In the measurement 
step, we compute the MMSE prediction of x k+ \ 
given x^, the output of the prediction step. Each 
MMSE computation can be done using the GaBP 
algorithm [1]. The basic idea, is that given the joint 
Gaussian distribution p(x, y) with the covariance 

matrix C = I y xx y xy ), we can compute the 

\ ^yy J 

MMSE prediction 

y = argmaxp(y|x) oc Af(n y \ x , E^) , 
y 

where 

fi"y\x = (Eyj/ — Tjy X T Jxx Tj X y) Tjy X x , 

Ey|x = (Ej/j/ — Tjyx/Lj xx Sjej,) 



This in turn is equivalent to computing the Schur 
complement of the lower right block of the matrix 
C. In total, computing the MMSE prediction in 
Gaussian graphical model boils down to a computa- 
tion of a matrix inverse. In [3] we have shown that 
GaBP is an efficient iterative algorithm for solving a 
system of linear equations (or equivalently comput- 
ing a matrix inverse). In [1] we have shown that for 
the specific case of linear detection we can compute 
the MMSE estimator using the GaBP algorithm. 
Next, we show that performing two consecutive 
computations of the MMSE are equivalent to one 
iteration of the Kalman filter. 

Theorem 1: The lower right lxl block of 
the matrix inverse E^ 1 (eq. U), computed by two 
MMSE iterations, is equivalent to the computation 
of Pk done by one iteration of the Kalman filter 
algorithm. 

Proof of Theorem 1 is given in Appendix A. 

In Section |V] we explain how to utilize the 
above observation to an efficient distributed 
iterative algorithm for computing the Kalman filter. 

III. Gaussian Information Bottleneck 

Given the joint distribution of a source variable 
X and another relevance variable Y, Information 
bottleneck (IB) operates to compress X, while pre- 
serving information about Y [8], [9], using the 
following variational problem: 

min C : C = I(X; T) - (3I(T; Y) 

p(t\x) 

T represents the compressed representation of X 
via the conditional distributions p(t\x), while the 
information that T maintains on Y is captured by 
the distributionp(?/|i). (3 > is a lagrange multiplier 
which weights the tradeoff between minimizing 
the compression information and maximizing the 
relevant information. As (3 — > we are interested 
solely in compression, but all relevant information 
about Y is lost I(Y; T) = 0. When (3 -> oo where 
are focused on preservation of relevant information, 
in this case T is simply the distribution X and we 
obtain I(T;Y) = I(X;Y). The interesting cases 
are in between, when for finite values of (3 we are 



able to extract rather compressed representation of 
X while still maintaining a significant fraction of 
the original information about Y. 

An iterative algorithm for solving the IB problem 
is given in [9]: 



■exp(-(3D KL \p(y\x)\\p k (y\t)]),(5a) 
P k {t) = f x p(x)P k (t\x)dx, (5b) 

P\y\t) = p% / a P k (t\x)p(x, y)dx. (5c) 

where Z k+l is a normalization factor computed in 
round k + 1. 

The Gaussian information bottleneck (GIB) [5] 
deals with the special case where the underlying dis- 
tributions are Gaussian. In this case, the computed 
distribution pit) is Gaussian as well, represented by 
a linear transformation T k = A k X+^ where A k is a 
joint covariance matrix of X and T, ~ A/"(0, E^) 
is a multivariate Gaussian independent of X. The 
outputs of the algorithm are the covariance matrices 
representing the linear transformation T: A k , S^ fe . 

An iterative algorithm is derived by substituting 
Gaussian distributions into ©, resulting in the fol- 
lowing update rules: 

= (p^y-iP-l)^' 1 ), (6a) 
A k+1 = /ffi^+iE-j^CJ-E^E- 1 ). (6b) 



TABLE I 

Summary of notations in the GIB [5] paper vs. Kalman 

FILTER [4] 



GIB [5] 


Kalman [4] 


Kalman meaning 


^>X 


Po 


a-priori estimate error covariance 




Q 


Process AWGN noise 


St, 


R 


Measurement AWGN noise 


^>xy i ^>yx 


A, A T 


process state transformation matrix 


T A A T T 


H T ,H 


measurement transformation matrix 




Pk 


posterior error covariance in round k 


y 


Pk 


a-priori error covariance in round k 



Since the underlying graphical model of both 
algorithms (GIB and Kalman filter) is Markovian 
with Gaussian probabilities, it is interesting to ask 
what is the relation between them. In this work 
we show, that the Kalman filter posterior error 
covariance computation is a special case of the 
GIB algorithm when (3=1. Furthermore, we show 




(d) 

Fig. 1. Comparison of the different graphical models used, (a) 
Gaussian Information Bottleneck [5] (b) Kalman Filter (c) Frey's 
sum-product factor graph [6] (d) Our new construction. 



how to compute GIB using the Kalman filter when 
(3 > 1 (the case where < (3 < 1 is not interesting 
since it gives a degenerate solution where A k = 
[5].) Table H outlines the different notations used 
by both algorithms. 



P(y|t) 



P(t|x) 
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Prediction 
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ment 
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/ \ Translation 
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mapping 
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Fig. 2. Comparison of the schematic operation of the different 
algorithms, (a) iterative information bottleneck operation (b) Kalman 
filter operation (c) Affine-scaling operation. 



but affect the posterior mean £f. (eq. [3b]). Second, 
Kalman filter computes both posterior mean Xk and 
error covariance P k . The covariance computed 
by the GIB algorithm was shown to be identical 
to Pk when (3 = 1. The GIB algorithm does 
not compute the posterior mean, but computes an 
additional covariance A k (eq. l6b|). which is assumed 
known in the Kalman filter. 



Theorem 2: The GIB algorithm when /3 = 1 is 
equivalent to the Kalman filter algorithm. 
The proof is given in Appendix B. 

Theorem 3: The GIB algorithm when (3 > 1 can 
be computed by a modified Kalman filter iteration. 
The proof is given in Appendix C. 

There are some differences between the GIB 
algorithm and Kalman filter computation. First, the 
Kalman filter has input observations z k in each 
round. Note that the observations do not affect the 
posterior error covariance computation P k (eq. l3cl) . 



From the information theoretic perspective, our 
work extends the ideas presented in [10]. Predictive 
information is defined to be the mutual information 
between the past and the future of a time serias. In 
that sense, by using Theorem 2, Kalman filter can 
be thought of as a prediction of the future, which 
from the one hand compresses the information about 
past, and from the other hand maintains information 
about the present. 

The origins of similarity between the GIB algo- 
rithm and Kalman filter are rooted in the IB iterative 
algorithm: For computing (|5al ), we need to compute 
(I5bl5cl) in recursion, and vice versa. 



IV. Relation to the Affine-scaling 

ALGORITHM 

One of the most efficient interior point methods 
used for linear programming is the Affine-scaling 
algorithm [11]. It is known that the Kalman filter 
is linked to the Affine-scaling algorithm [7]. In this 
work we give an alternate proof, based on different 
construction, which shows that Affine-scaling is an 
instance of Kalman filter, which is an instance of 
GIB. This link between estimation and optimization 
allows for numerous applications. Furthermore, by 
providing a single distribute efficient implementa- 
tion of the GIB algorithm, we are able to solve 
numerous problems in communication networks. 

The linear programming problem in its canonical 
form is given by: 

minimize c T x (7a) 

subject to Ax = b, x > 0. (7b) 

where A e R nxp with rank{A} = p < n. We 
assume the problem is solvable with an optimal x*. 
We also assume that the problem is strictly feasible, 
in other words there exists x G W 1 that satisfies 
Ax = b and x > 0. 

The Affine-scaling algorithm [11] is summarized 
below. Assume x is an interior feasible point to 
dTbl) . Let D = diag(x ). The Affine-scaling is an 
iterative algorithm which computes a new feasible 
point that minimizes the cost function (f7ab : 

xi = x - -D 2 v (8) 

7 

where < a < 1 is the step size, r is the step 
direction. 

r = {c-A T w), (9a) 

w = (AD 2 A T )- 1 AD 2 c, (9b) 

7 = max(ejP_Dc). (9c) 

i 

Where is the i th unit vector and P is a projection 
matrix given by: 

P = I- DA T (AD 2 A T )- 1 AD. (10) 

The algorithm continues in rounds and is guaranteed 
to find an optimal solution in at most n rounds. 
In a nutshell, in each iteration, the Affine-scaling 
algorithm first performs an Affine-scaling with re- 
spect to the current solution point x* and obtains 



the direction of descent by projecting the gradient 
of the transformed cost function on the null space of 
the constraints set. The new solution is obtained by 
translating the current solution along the direction 
found and then mapping the result back into the 
original space [7]. This has interesting analogy for 
the two phases of the Kalman filter. 

Theorem 4: The Affine-scaling algorithm itera- 
tion is an instance of the Kalman filter algorithm 
iteration. 

Proof is given in Appendix D. 

V. Efficient distributed computation 

We have shown how to express the Kalman filter, 
Gaussian information bottleneck and Affine-scaling 
algorithms as a two step MMSE computation. Each 
step involves inverting a 2 x 2 block matrix. Recent 
result by Bickson and Shental et al. [1] show that 
the MMSE computation can be done efficiently and 
distributively using the Gaussian belief propagation 
algorithm. Because of space limitations the full 
algorithm is not reproduced here. 

The interested reader is referred to [1], [3] for 
a complete derivation of the GaBP update rules 
and convergence analysis. The GaBP algorithm is 
summarized in Table Hfl 

Regarding convergence, if it converges, GaBP is 
known to result in exact inference [12]. Determining 
the exact region of convergence and convergence 
rate remain open research problems. All that is 
known is a sufficient (but not necessary) condi- 
tion [13], [14] stating that GaBP converges when 
the spectral radius satisfies p(\Ix — A\) < 1, where 
A is first normalized s.t. the main diagonal contains 
ones. A stricter sufficient condition [12], determines 
that the matrix A must be diagonally dominant 
(i.e., \An\ > Ylij+i lAjl) Vi) in order for GaBP to 
converge. 

Regarding convergence speed, [15] shows that 
when converging, the algorithm converges in 
0(log(e) /log{^)) iterations, where e is the desired 
accuracy, and l/2<7<lisa parameter related 
to the inverted matrix. The computation overhead in 
each iteration is determined by the number of non- 
zero elements of the inverted matrix A. In practice, 
[16] demonstrates convergence of 5-10 rounds on 
sparse matrices with several millions of variables. 
[17] shows convergence of dense constraint matrices 



TABLE II 

Computing x = A^b via GaBP [3]. 



# 


Stage 


Operation 


1. 


Initialize 


Compute Pa = A {i and = bi/A ti . 
bet Fj-i = and jUjy = 0, vk i. 


z. 


Iterate 


Propagate P^ and /i^, Vfc 7^ ? such that ^ 0. 

Compute P Ai = Pa + EfceN(i)\i P « and = P^iPu^u + E feeN (i)V P ^ki)- 
Compute Pij = -AijP^Aji and % = -/^ '. I,,//, ; . 


3. 


Check 


If Pij and did not converge, return to #2. Else, continue to #4. 


4. 


Infer 


= Pii + EfcGN(i) 1 i i i = P-i (Pii^ii + E/fceN(i) Pki^ki) ■ 


5. 


Output 





of size up to 150, 000 x 150, 000 in 6 rounds, 
where the algorithm is run in parallel using 1,024 
CPUs. Empirical comparison with other iterative 
algorithms is given in [2]. 

VI. Example application 

The TransFab software package is a distributed 
middleware developed in IBM Haifa Labs, which 
supports real time forwarding of message streams, 
providing quality of service guarantees. We plan 
to use our distributed Kalman filter algorithm for 
online monitoring of software resources and perfor- 
mance. On each second each node records a vec- 
tor of performance parameters like memory usage, 
CPU usage, current bandwidth, queue sizes etc. The 
nodes execute the distributed Kalman filter algo- 
rithm on the background. Figure Oplots a co variance 
matrix of running an experiment using two TransFab 
nodes propagating data. The covariance matrix is 
used as an input the Kalman filter algorithm. Yellow 
sections show high correlation between measured 
parameters. Initial results are encouraging, we plan 
to report them using a future contribution. 

VII. Conclusion 

In this work we have linked together several 
different algorithms from the the fields of estimation 
(Kalman filter), clustering/compression (Gaussian 
information bottleneck) and optimization (Affine- 
scaling interior-point method). Besides of the theo- 
retical interest in linking those different domains, we 
are motivated by practical problems in communica- 
tion networks. To this end, we propose an efficient 
distributed iterative algorithm, the Gaussian belief 




Fig. 3. Covariance matrix which represents vector data captured 
from two Transfab nodes. 



propagation algorithm, to be used for efficiently 
solving these problems. 
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Appendix A 

Proof: We prove that inverting the matrix E 
(eq. SI) is equivalent to one iteration of the Kalman 
filter for computing P k . 

We start from the matrix E and show that P k 
can be computed in recursion using the Schur com- 
plement formula: 



applied to the 2x2 upper left submatrix of E, where 

D = Q,C = A T , B = A,A = P fc _! we get: 



Pi 



Q + A T P k _, A . 



Now we compute recursively the Schur comple- 
ment of lower right 2x2 submatrix of the matrix 
E using the matrix inversion lemma: 

A' 1 + A~ l B(D - CA- 1 B)- 1 CA- 1 

where A' 1 = P k ,B = H T ,C = H,D = Q. In 
total we get: 

A- 1 A- 1 B D c A- 1 B c A- 1 



J3aj 



(12) 



(/ - P k H T (HP k H T + R)- 1 H)P k = 
= \l - K k H)P k - = P k 

■ 

Appendix B 
Proof: Looking at [5, §39], when (3 = 1 we get 



J tk\y 



MMSE 

A 



^t k ^tky^y^ytk — 
[5, §38b] 

, A V 

+ BT ^y\t k B = 

[5, §34] [5, §34] 



-1 



[5, §33] 



[5, §33] 



A T E X A + S c + (A T Y, X A + S ? ) A T Z xy - 
[5, §33] 

' ~ " ? 

'^yltk^yxA (A Tj x A + Sg) = 

A T T, X A + E s + (A T T, X A + ^)A T Y> xy - 
MMSE 



D - CA- X B 



■ (Ey + E ytk Il tk Y; tk y) E yx A(A T E x A + S^) T 
A t Yj x A + S c + (A T E X A + E € )A T 5V 
[5,J5] ^ ([5 , J5] [5 , §5] 

[Yjy + A Yjy X {^AYj x A + S^) Yj X yA )■ 



(11) ■X yx A(A T Z x A + Zz) T . 



Now we show this formulation is equivalent to 
the Kalman filter with the following notations: 

= ^xA + Sg) , H = A Syx; R = ^y, 



-Pfc-i — Q — E^. 



Substituting we get: 



1 k 



(A T E X A + ^)+\a t e x a + Zs)A T Z xy 
R " - Pf ~ ^ T ~, 

■( + A Tjy X (A S X A + S^) Yi xy A)- 



■ Yiy X A (A + S^) . 



Which is equivalent to (fT2|) . Now we can apply 
Theorem 1 and get the desired result. ■ 



Appendix C 

Proof: In the case where /3 > 1, the MAP co- 
variance matrix as computed by the GIB algorithm 
is: 



EfiH-i = ^t k \y + (1 - P)^ 



(13) 



This is a weighted average of two covariance ma- 
trices. E tfc is computed at the first phase of the 
algorithm (equivalent to the prediction phase in 
Kalman literature), and £t fe ij/ is computed in the 
second phase of the algorithm (measurement phase). 
At the end of the Kalman iteration, we simply 
compute the weighted average of the two matrices 
to get (fT3l . Finally, we compute Ak+i using (eq.| 
by substituting the modified S^ fc . 



(8) 



a 



Xi = x 



x - 



-D z r 

7 



x « 

maxe,PZ)c 



D 2 r 



max, et (I ~ DA (AD A ) AD) Dc 



D 2 r 



x 




i e i (I-DA T (AD 2 A T )- 1 AD)Dc 



_ a p 2 (c-A T (AD 2 A T )- 1 AD 2 c) 

X ° max, e i (I-DA T (AD 2 A r )AD)- 1 Dc ~ 
aD(I —DA T (AD 2 A T )~ 1 AD)Dc 
X ° max, e t (I-DA T (AD 2 A r )~ 1 AD)Dc 

Looking at the numerator and using the Schur com- 
plement formula (fTT)) with the following notations: 

A 4 (AD 2 A T )-\B 4 AD,C 4 DA T ,D 4 / 

' AD 2 A T AD 



we get the following matrix: 



DA 1 



Again, the upper left block is a Schur complement 

A = 0,B ■ 

ing matrix: 



A = 0,B = AD, C = DA T , D = I of the follow 



DA 1 I 
block matrix of the form: 



AD \ , . , n 

T T .In total with get a 3 x 3 



AD 
DA T I AD 
DA T I 
Note that the divisor is a scalar which affects the 
scaling of the step size. 

Using Theorem 1, we get a computation of 
Kalman filter with the following parameters: 
A,H = AD, Q = I, R = I, P = 0. This has an 
interesting interpretation in the context of Kalman 
filter: both prediction and measurement transforma- 
tion are identical and equal AD. The noise variance 
of both transformations are Gaussian variables with 
prior oc X( 0,1). ■ 



Appendix D 



Proof: We start by expanding the Affine- 
scaling update rule: 



